fix: replace copy with deepcopy in Mooncake pullbacks #723
Conversation
Codecov Report
Attention: Patch coverage is

@@ Coverage Diff @@
##             main     #723      +/-   ##
==========================================
- Coverage   97.99%   97.98%   -0.01%
==========================================
  Files         121      121
  Lines        6341     6363      +22
==========================================
+ Hits         6214     6235      +21
- Misses        127      128       +1

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Largely LGTM. Would it make sense to add a regression test which checks that the new copying mechanism works for parameters which are e.g. `Tuple`s / `NamedTuple`s?
I would like to ensure the absence of regression more systematically by using test scenarios which involve tuples or nested structs. One option is to define new ones (#724). Another option would be to implement the necessary standardization so that Mooncake can be tested on Flux and Lux scenarios (no idea how complicated that would be). A third option would be to simply check that Mooncake runs on these neural net scenarios, without ensuring correctness, because comparison to the reference gradient requires standardization. Thoughts?
Hmmm interesting. Is it documented anywhere what standardisations must be applied in order to support Flux / Lux? Additionally, do the two frameworks agree? I'd be keen to get Mooncake tested on them as part of DI / Mooncake's own tests. I don't want to slow down this PR though, so perhaps just merge this, and we can discuss elsewhere?
Not documented, but these standardizations are necessary to compare the gradient returned by different backends. But I've never even tried Mooncake on such scenarios so maybe it does the right thing / agrees with Zygote out of the box. Let's merge this and discuss further in #724 |
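The regression check being floated here has a backend-agnostic shape: compute the gradient twice and verify that the two results do not alias each other's buffers. A hedged Python sketch of that pattern follows; `ToyBackend`, `gradient_unsafe`, and `gradient_safe` are hypothetical names standing in for a real pullback, not DI's or Mooncake's API.

```python
import copy

class ToyBackend:
    """Toy AD backend that reuses an internal tangent buffer between calls."""
    def __init__(self):
        self._buffer = {"weight": [0.0, 0.0]}

    def gradient_unsafe(self, xs):
        # Mutates the cached buffer in place, then returns a shallow copy:
        # the inner list is still shared with the internal cache.
        for i, x in enumerate(xs):
            self._buffer["weight"][i] = 2.0 * x
        return copy.copy(self._buffer)

    def gradient_safe(self, xs):
        # Same computation, but deepcopy isolates the result from the cache.
        for i, x in enumerate(xs):
            self._buffer["weight"][i] = 2.0 * x
        return copy.deepcopy(self._buffer)

def results_alias(grad_fn, xs):
    # Regression check: two successive gradients must not share buffers.
    g1 = grad_fn(xs)
    g2 = grad_fn(xs)
    return g1["weight"] is g2["weight"]

backend = ToyBackend()
print(results_alias(backend.gradient_unsafe, [1.0, 2.0]))  # True: shallow copy leaks
print(results_alias(backend.gradient_safe, [1.0, 2.0]))    # False: deepcopy isolates
```

Such a check would catch aliasing on `Tuple`/`NamedTuple`-shaped parameters without needing a reference gradient, sidestepping the standardization question.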
Replace `copy` with `deepcopy` when returning general tangents, to fix "How to use Mooncake in Lux" (chalk-lab/Mooncake.jl#467). Add a shortcut for numbers and arrays to avoid paying the price of `deepcopy`. Use `Mooncake.value_and_gradient!!` for `DI.gradient`; the same reasoning applies for copying before returning. Add `f::F` everywhere to force specialization.

@willtebbutt does this look good?
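The shallow-versus-deep copy distinction driving this fix can be illustrated with a small Python analogy (a hedged sketch, not the Julia implementation): a shallow copy of a tangent that nests mutable buffers still shares those buffers, so later in-place mutation by the AD engine leaks into the returned gradient, while a deep copy with a fast path for plain numbers and arrays avoids both the bug and the overhead. The function names here are illustrative.

```python
import copy

def return_tangent_shallow(tangent):
    # copy.copy duplicates only the outer container; nested buffers are shared.
    return copy.copy(tangent)

def return_tangent_deep(tangent):
    # Fast path: numbers and flat lists of numbers are cheap to copy directly,
    # avoiding the price of a full deepcopy (the "shortcut" mentioned above).
    if isinstance(tangent, (int, float)):
        return tangent
    if isinstance(tangent, list) and all(isinstance(x, (int, float)) for x in tangent):
        return list(tangent)
    # General case: deepcopy so later in-place mutation cannot leak out.
    return copy.deepcopy(tangent)

# A tangent shaped like a NamedTuple of arrays: a dict holding lists.
tangent = {"weight": [1.0, 2.0], "bias": [0.5]}
shallow = return_tangent_shallow(tangent)
deep = return_tangent_deep(tangent)

tangent["weight"][0] = 99.0  # simulate the AD engine reusing its buffers

print(shallow["weight"][0])  # 99.0 -- mutation leaked through the shallow copy
print(deep["weight"][0])     # 1.0  -- the deep copy is unaffected
```

In Julia the same dichotomy holds between `copy` (shallow) and `deepcopy` (recursive), which is why the shortcut for leaf types like numbers and arrays recovers most of the lost performance.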